A Restoration and Segmentation Unit for the Historic Persian Documents
Internal identifier: 001352 (Main/Exploration); previous: 001351; next: 001353
Authors: Shahpour Alirezaee [Iran]; Shayesteh Fard [Iran]; Hassan Aghaeinia [Iran]; Karim Faez [Iran]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2005.
French descriptors
- Pascal (Inist)
- Wicri:
- geographic: Iran.
English descriptors
- KwdEn:
Abstract
This paper aims to provide a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment overlapping lines, words, and characters in Middle Persian documents in preparation for OCR. To evaluate the restoration algorithm, 200 pages of Pahlavi documents were used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and destructive effects. The results also show 99.14% accuracy in baseline detection, 97.35% accuracy in text-line extraction and removal of overlaps from adjacent lines, and 99.5% accuracy in segmenting the extracted text lines into their components.
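The abstract names the building blocks (morphological noise removal, baseline detection, text-line extraction, connected-component segmentation) without giving details. The Python sketch below illustrates those generic operations on a toy binary image; it is not the authors' algorithm. The 3x3 square structuring element, the "row with the most ink" baseline rule, the gap-based line bands (which, unlike the paper's method, cannot separate overlapping lines), 4-connectivity, and all function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical helpers, not the paper's code. Images are binary grids
# (lists of lists): 1 = ink, 0 = background. The 3x3 square structuring
# element (k=1) and 4-connectivity are illustrative assumptions.

def erode(img, k=1):
    """Binary erosion with a (2k+1)x(2k+1) square; off-image pixels count as background."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, k=1):
    """Binary dilation with the same square structuring element."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]

def opening(img, k=1):
    """Erosion then dilation: removes ink specks smaller than the element."""
    return dilate(erode(img, k), k)

def row_profile(img):
    """Horizontal projection: number of ink pixels in each row."""
    return [sum(row) for row in img]

def baseline_row(img):
    """Heuristic baseline (assumption): the first row with the most ink."""
    profile = row_profile(img)
    return max(range(len(profile)), key=profile.__getitem__)

def line_bands(img):
    """Maximal runs of consecutive inked rows, as (first_row, last_row)."""
    bands, start = [], None
    for y, count in enumerate(row_profile(img)):
        if count and start is None:
            start = y
        elif not count and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))
    return bands

def components(img):
    """4-connected components as bounding boxes (top, left, bottom, right)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy page: two 3x3 ink blobs plus a one-pixel speck of noise.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 1],  # lone pixel at column 9 is noise
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
clean = opening(page)
print(len(components(page)), components(clean))  # 3 [(1, 1, 3, 3), (2, 5, 4, 7)]
print(baseline_row(clean), line_bands(clean))     # 2 [(1, 4)]
```

The opening removes the isolated speck, so only the two blobs remain as components. A structuring element this size would also erase genuine one-pixel-wide strokes, which is one reason a real restoration unit would need tuned elements rather than this fixed choice.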
Url:
DOI: 10.1007/11558484_85
Affiliations:
Links to previous steps (curation, corpus, ...)
- to stream Istex, to step Corpus: 001883
- to stream Istex, to step Curation: 001785
- to stream Istex, to step Checkpoint: 000C61
- to stream Main, to step Merge: 001388
- to stream PascalFrancis, to step Corpus: 000425
- to stream PascalFrancis, to step Curation: 000362
- to stream PascalFrancis, to step Checkpoint: 000439
- to stream Main, to step Merge: 001469
- to stream Main, to step Curation: 001352
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author><name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
</author>
<author><name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
</author>
<author><name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
</author>
<author><name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11558484_85</idno>
<idno type="url">https://api.istex.fr/document/25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001883</idno>
<idno type="wicri:Area/Istex/Curation">001785</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C61</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Alirezaee S:a:restoration:and</idno>
<idno type="wicri:Area/Main/Merge">001388</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:06-0001215</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000425</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000362</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000439</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Alirezaee S:a:restoration:and</idno>
<idno type="wicri:Area/Main/Merge">001469</idno>
<idno type="wicri:Area/Main/Curation">001352</idno>
<idno type="wicri:Area/Main/Exploration">001352</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author><name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Islamic Azad University of Abhar, Abhar</wicri:regionArea>
<wicri:noRegion>Abhar</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Zanjan University, Zanjan</wicri:regionArea>
<wicri:noRegion>Zanjan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<idno type="DOI">10.1007/11558484_85</idno>
<idno type="ChapterID">85</idno>
<idno type="ChapterID">Chap85</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Document analysis</term>
<term>Document processing</term>
<term>Iran</term>
<term>Mathematical morphology</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Analyse documentaire</term>
<term>Extraction forme</term>
<term>Iran</term>
<term>Langage iranien</term>
<term>Morphologie mathématique</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Restauration document perse</term>
<term>Segmentation document perse</term>
<term>Texte</term>
<term>Traitement document</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr"><term>Iran</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper aims to provide a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment overlapping lines, words, and characters in Middle Persian documents in preparation for OCR. To evaluate the restoration algorithm, 200 pages of Pahlavi documents were used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and destructive effects. The results also show 99.14% accuracy in baseline detection, 97.35% accuracy in text-line extraction and removal of overlaps from adjacent lines, and 99.5% accuracy in segmenting the extracted text lines into their components.</div>
</front>
</TEI>
<affiliations><list><country><li>Iran</li>
</country>
</list>
<tree><country name="Iran"><noRegion><name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
</noRegion>
<name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
<name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
<name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
<name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
</country>
</tree>
</affiliations>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001352 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001352 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB |texte= A Restoration and Segmentation Unit for the Historic Persian Documents }}
This area was generated with Dilib version V0.6.32.